26 results for deviance information criteria, model averaging, MCMC, genomewide association studies, epistasis, logistic regression, stochastic search algorithm, case-control studies, Type I diabetes, single nucleotide polymorphism, gene expression programming

in DigitalCommons@The Texas Medical Center


Relevance:

100.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and produces false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing and false discovery rates, the joint effects of several genes, each with a weak effect, might not be detected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets of one, two or three SNPs based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter because of overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in this study.

From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association study through chi-square tests shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and SNPs with good discriminant power are not necessarily causal markers for the disease.
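
The abstract's point that HMSS behaves better than plain accuracy on imbalanced data can be illustrated with a small sketch. The function names and the confusion-matrix counts below are hypothetical and are not taken from the dissertation; the sketch only assumes that HMSS is the ordinary harmonic mean of sensitivity and specificity.

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (assumes both classes are non-empty)."""
    sensitivity = tp / (tp + fn)   # fraction of cases classified as cases
    specificity = tn / (tn + fp)   # fraction of controls classified as controls
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp, fn, tn, fp):
    """Overall fraction of correctly classified subjects."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced sample: 10 cases, 90 controls.
# A classifier that labels everyone "control" still scores high accuracy,
# while HMSS drops to zero because no case is detected.
all_control = dict(tp=0, fn=10, tn=90, fp=0)
print(accuracy(**all_control), hmss(**all_control))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot score well on HMSS by ignoring the minority class, which is why no resampling of the original data is needed.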

Relevance:

100.00%

Abstract:

With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. In step-wise linear regression, the best predictor of the quantitative trait was usually not the single functional mutation but a SNP in high linkage disequilibrium with it. Next, I compared three strategies for selecting SNPs for phenotype association studies: selection based on measures of linkage disequilibrium, selection based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with a small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to keep increasing the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers face the challenge of choosing a statistical method that controls the type I error rate while maintaining adequate power. We show that an empirical genotype-based multi-locus global test, which uses permutation testing to investigate the null distribution of the maximum test statistic, maintains a desired overall type I error rate without overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test does as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of use to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
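
The idea of a global test built on the permutation distribution of the maximum statistic can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's procedure: the per-SNP statistic is assumed to be a 2x2 chi-square on carrier status versus case status, and all function names and parameters are hypothetical.

```python
import random


def chi2_stat(genotypes, labels):
    """2x2 chi-square for carrier status (genotype > 0) versus case status (label == 1)."""
    a = b = c = d = 0
    for g, y in zip(genotypes, labels):
        if g > 0 and y == 1:
            a += 1
        elif g > 0:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_stat(snps, labels):
    """Largest single-SNP statistic across all loci."""
    return max(chi2_stat(s, labels) for s in snps)


def global_permutation_p(snps, labels, n_perm=1000, seed=0):
    """Empirical p-value for the maximum statistic under permuted phenotype labels."""
    rng = random.Random(seed)
    observed = max_stat(snps, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permuting labels keeps the LD structure among SNPs intact
        if max_stat(snps, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids a p-value of zero
```

Comparing the observed maximum with the permuted maxima yields one p-value for the whole set of loci, so the overall type I error rate is controlled without assuming that the SNPs are independent.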

Relevance:

100.00%

Abstract:

A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a "cosmopolitan" tagging approach to capture the genetic diversity across approximately 2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.

Relevance:

100.00%

Abstract:

The type 2 diabetes (diabetes) pandemic is recognized as a threat to tuberculosis (TB) control worldwide. This secondary data analysis project estimated the contribution of diabetes to TB in a binational community on the Texas-Mexico border where both diseases occur. Newly diagnosed TB patients > 20 years of age were prospectively enrolled at Texas-Mexico border clinics between January 2006 and November 2008. Upon enrollment, information regarding social, demographic, and medical risks for TB was collected at interview, including self-reported diabetes. In addition, self-reported diabetes was supported by blood confirmation according to guidelines published by the American Diabetes Association (ADA). For this project, the data were compared with existing statistics for TB incidence and diabetes prevalence from the corresponding general populations of each study site to estimate the relative and attributable risks of diabetes to TB. In concordance with historical sociodemographic data provided for TB patients with self-reported diabetes, our TB patients with diabetes also lacked the risk factors traditionally associated with TB (alcohol abuse, drug abuse, history of incarceration, and HIV infection); instead, the majority of our TB patients with diabetes were characterized by overweight/obesity, chronic hyperglycemia, and older median age. In addition, diabetes prevalence among our TB patients was significantly higher than in the corresponding general populations. The findings of this study will help to characterize TB patients with diabetes accurately, thus aiding the timely recognition and diagnosis of TB in a population not traditionally viewed as at-risk. We provide epidemiological and biological evidence that diabetes continues to be an increasingly important risk factor for TB.
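
The abstract does not say how the relative and attributable risks were computed. One common approach, shown here purely as an assumption, derives the relative risk from incidence in exposed and unexposed groups and the population attributable fraction from Levin's formula. The function names and the numbers in the example are hypothetical.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    """Ratio of disease incidence in the exposed group to that in the unexposed group."""
    return incidence_exposed / incidence_unexposed


def population_attributable_fraction(exposure_prevalence, rr):
    """Levin's formula: share of cases in the population attributable to the exposure."""
    excess = exposure_prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: TB incidence per 100,000 in people with and without diabetes,
# and a 20% diabetes prevalence in the general population.
rr = relative_risk(30.0, 10.0)
paf = population_attributable_fraction(0.20, rr)
print(rr, round(paf, 3))
```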

Relevance:

100.00%

Abstract:

Objective: The objective of this study is to investigate the association between processed and unprocessed red meat consumption and prostate cancer (PCa) stage in a homogeneous Mexican-American population. Methods: This population-based case-control study had a total of 582 participants (287 cases with histologically confirmed adenocarcinoma of the prostate gland and 295 age- and ethnicity-matched controls), all of whom resided in the Southeast region of Texas from 1998 to 2006. All questionnaire information was collected using a validated data collection instrument. Statistical Analysis: Descriptive analyses included Student's t-test and Pearson's chi-square tests. Odds ratios and 95% confidence intervals were calculated to quantify the association between nutritional factors and PCa stage. A multivariable model was used for unconditional logistic regression. Results: After adjusting for relevant covariates, those who consumed high amounts of processed red meat had non-significantly increased odds of being diagnosed with localized PCa (OR = 1.60, 95% CI: 0.85 - 3.03) and total PCa (OR = 1.43, 95% CI: 0.81 - 2.52) but not with advanced PCa (OR = 0.91, 95% CI: 1.37 - 2.23). Interestingly, high consumption of carbohydrates showed a significant reduction in the odds of being diagnosed with total PCa and advanced PCa (OR = 0.43, 95% CI: 0.24 - 0.77; OR = 0.27, 95% CI: 0.10 - 0.71, respectively). However, consuming high amounts of energy from protein and fat was shown to increase the odds of being diagnosed with advanced PCa (OR = 4.62, 95% CI: 1.69 - 12.59; OR = 2.61, 95% CI: 1.04 - 6.58, respectively). Conclusion: Mexican-Americans who consumed high amounts of energy from protein and fat had increased odds of being diagnosed with advanced PCa, while high amounts of carbohydrates reduced the odds of being diagnosed with total and advanced PCa.

Relevance:

100.00%

Abstract:

In June 1995 a case-control study was initiated by the Texas Department of Health among Mexican American women residing in the fourteen counties of the Texas-Mexico border. Case-women had carried infants with neural tube defects. Control-women had given birth to infants without neural tube defects. The case-control protocol included a general questionnaire which elicited information regarding illnesses experienced and antibiotics taken from three months prior to conception to three months after conception. An assessment of the associations between periconceptional diarrhea and the risk of neural tube defects indicated that the unadjusted association of diarrhea and risk of neural tube defect was significant (OR = 3.3, CI = 1.4–7.6). The unadjusted association of use of oral antimicrobials and risk of neural tube defect was also significant (OR = 3.4, CI = 1.6–7.3). These associations persisted among women who had no fever during the periconceptional period and were present irrespective of folate intake. Diarrhea was associated with an increased risk of NTD independent of use of antimicrobials. The converse was also true: antimicrobials were associated with an increased risk of NTD independent of diarrhea. Further research regarding these potentially modifiable risk factors is warranted. Replication of these findings could result in interventions in addition to folate supplementation.
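
Unadjusted odds ratios with confidence intervals, like those reported above, are typically computed from a 2x2 exposure-by-outcome table. The sketch below uses the standard Wald interval on the log-odds scale. The abstract does not state which interval method was used, and the cell counts here are hypothetical rather than the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a significant unadjusted association at the matching level, which is how the intervals quoted in the abstract are read.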

Relevance:

100.00%

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This raises great challenges for sequence-based genetic studies of complex diseases.

This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with a scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.

This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.

Relevance:

100.00%

Abstract:

BACKGROUND: Meningomyelocele (MM) results from lack of closure of the neural tube during embryologic development. Periconceptional folic acid supplementation is a modifier of MM risk in humans, leading to an interest in the folate transport genes as potential candidates for association with MM. METHODS: This study used the SNPlex Genotyping (ABI, Foster City, CA) platform to genotype 20 single polymorphic variants across the folate receptor genes (FOLR1, FOLR2, FOLR3) and the folate carrier gene (SLC19A1) to assess their association with MM. The study population included 329 trio and 281 duo families. Only cases with MM were included. Genetic association was assessed using the transmission disequilibrium test in PLINK. RESULTS: A variant in the FOLR2 gene (rs13908), three linked variants in the FOLR3 gene (rs7925545, rs7926875, rs7926987), and two variants in the SLC19A1 gene (rs1888530 and rs3788200) were statistically significant for association with MM in our population. CONCLUSION: This study involved the analysis of selected single nucleotide polymorphisms across the folate receptor genes and the folate carrier gene in a large population sample. It provided evidence that the rare alleles of specific single nucleotide polymorphisms within these genes appear to be significantly associated with MM in the patient population that was tested.
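
The transmission disequilibrium test compares how often heterozygous parents transmit a given allele to an affected child versus how often they do not. The sketch below shows the basic McNemar-type statistic only. It is not PLINK's implementation, the function names are hypothetical, and the transmission counts are invented for illustration.

```python
import math


def tdt_chi2(transmitted, untransmitted):
    """TDT statistic (b - c)^2 / (b + c) from transmissions by heterozygous parents."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: the allele was transmitted 60 times and not transmitted 40 times.
stat = tdt_chi2(60, 40)
print(stat, chi2_1df_pvalue(stat))
```

Because only transmissions within families are compared, the test is robust to population stratification, which is one reason trio and duo designs like the one described above are used.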

Relevance:

100.00%

Abstract:

Using a human terato-carcinoma cell line, PA-1, the functional roles of the oncogenes and tumor suppressor genes involved in the multistep process of carcinogenesis have been analyzed. The expression of AP-2 was strongly correlated with susceptibility to ras transformation. Differential responsiveness to growth factors between stage 1 ras resistant cells and stage 2 ras susceptible cells was observed, indicating that the ability of stage 2 cells to respond to the mutated ras oncogenes in transformation correlated with the ability to be stimulated by certain growth factors. Using differential screening of cDNA libraries, a number of differentially expressed cDNA clones were isolated. One of those, clone 12, is overexpressed in ras transformed stage 3 cells. The amino acid sequence of clone 12 is almost identical to that of a mouse LLrep3 gene that was growth-regulated, and 78% similar to that of a yeast ribosomal protein S4. These results suggest that the S4 gene may be involved in the regulation of growth. Clone 9 is expressed in stage 1 ras resistant cells (3.5-kb and 3.0-kb transcripts), but the expression of this clone in stage 2 ras susceptible cells and stage 3 ras-transformed cells is greatly diminished. The expression of this cDNA clone increased at least fivefold in ras resistant cells and nontumorigenic hybrids treated with retinoic acid, but did not increase in retinoic acid treated ras susceptible cells, ras transformed cells and the tumorigenic segregants. The partial sequence of this clone showed no homology to the sequences in GenBank. These findings suggest that clone 9 could be a suppressor gene or one of the genes involved in the biochemical pathway of tumor suppression or neurogenic differentiation. The apparent pleiotropic effect of the loss of this suppressor gene function supports Harris' proposal that tumor suppressor genes regulate differentiation. The tumor suppressor gene may act as a negative regulator of tumor growth by controlling gene expression in differentiation.

Relevance:

100.00%

Abstract:

Breast cancer is the most common cancer in women in the United States and is a leading cause of cancer-related deaths (1). Recently, dietary heterocyclic amines (HCAs) have been proposed as a risk factor for breast cancer (2). This study uses the data collected for a case-control study conducted at the M. D. Anderson Cancer Center to assess the association between breast cancer risk and HCAs {2-amino-1-methyl-6-phenylimidazole [4,5-b] pyridine (PhIP), 2-amino-3,8-dimethylimidazo [4,5-f] quinoxaline (MeIQx), 2-amino-3,4,8-trimethylimidazo [4,5-f] quinoxaline (DiMeIQx) and mutagenicity of HCAs} and to examine whether this association is modified by genetic polymorphisms of N-acetyl transferases (NAT1/NAT2). The NAT1/2 genotype was determined using Taqman technology. HCA intake was estimated by using a meat preparation questionnaire on meat type, cooking method, and doneness, combined with a quantitative HCA database. Three hundred and fifty patients with breast cancer attending the Diagnostic Radiology Clinic at M. D. Anderson Cancer Center and fulfilling the eligibility criteria were compared with three hundred and fifty patients attending the same clinic for benign breast lesions to answer these questions. Logistic regression models were used to control for known risk factors and showed no statistically significant association between breast cancer versus benign breast lesions and dietary intake of heterocyclic amines. There was no clear difference in their effect after subgroup analyses in different acetylator strata of NAT1/2, and no statistical interactions were found between NAT1/2 genotypes and HCAs, suggesting no effect modification by NAT1/2 acetylator status. These results suggest the need for further research to determine whether these null associations arose because benign breast lesions share risk factors with breast cancer or because of other factors that have not yet been explored.

Relevance:

100.00%

Abstract:

Several studies have examined the association between high glycemic index (GI) and glycemic load (GL) diets and the risk for coronary heart disease (CHD). However, most of these studies were conducted primarily on white populations. The primary aim of this study was to examine whether high GI and GL diets are associated with increased risk for developing CHD in whites and African Americans, non-diabetics and diabetics, and within stratifications of body mass index (BMI) and hypertension (HTN). Baseline and 17-year follow-up data from the ARIC (Atherosclerosis Risk in Communities) study were used. The study population (13,051) consisted of 74% whites, 26% African Americans, 89% non-diabetics, 11% diabetics, 43% males and 57% females, aged 44 to 66 years at baseline. Data from the ARIC food frequency questionnaire at baseline were analyzed to provide GI and GL indices for each subject. Increases of 25 and 30 units for GI and GL, respectively, were used to describe relationships with incident CHD risk. Hazard ratios adjusted for propensity score, with 95% confidence intervals (CI), were used to assess associations. During 17 years of follow-up (1987 to 2004), 1,683 cases of CHD were recorded. Glycemic index was associated with a 2.12-fold (95% CI: 1.05, 4.30) increased incident CHD risk for all African Americans, and GL was associated with a 1.14-fold (95% CI: 1.04, 1.25) increased CHD risk for all whites. In addition, GL was an important CHD risk factor for white non-diabetics (HR=1.59; 95% CI: 1.33, 1.90). Furthermore, within the stratum of BMI 23.0 to 29.9 in non-diabetics, GI was associated with an increased hazard ratio of 11.99 (95% CI: 2.31, 62.18) for CHD in African Americans, and GL was associated with a 1.23-fold (1.08, 1.39) increased CHD risk in whites. Body mass index modified the effect of GI and GL on CHD risk in all whites and white non-diabetics. For HTN, both systolic blood pressure and diastolic blood pressure modified the effect of GI and GL on CHD risk in all whites and African Americans, white and African American non-diabetics, and white diabetics. Further studies should examine other factors that could influence the effects of GI and GL on CHD risk, including dietary factors, physical activity, and diet-gene interactions.

Relevance:

100.00%

Abstract:

Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as outcome measures. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. It also estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data were created based on a set of hypothetical parameters and hypothetical rates of misclassification. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488, and β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as those for β1 and β2, they validate this method. X1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X1 (having both copies of the APOE 4 allele) changed from 1.377 to 1.418, demonstrating that the estimates of the odds ratio changed when the analysis included adjustment for misclassification.

Relevance:

100.00%

Abstract:

The cornea of the human eye can develop deposits of lipids in the periphery known as corneal arcus. [2, 10] For over a century, these deposits have been of interest as possible indicators of the accumulation of lipids in arterial walls of the heart and body, with implications for heart disease. [2, 10, 11, 29] Heart disease is currently the leading cause of death in this country. [5, 29] There have been several publications suggesting an association between the development of atherosclerotic lesions and corneal arcus. [2, 12, 29] Investigators have differed in their interpretation of the relevance of corneal arcus to coronary heart disease or cardiovascular disease. However, there is widespread consensus that the presence of corneal arcus in patients under the age of 50 should prompt physicians to investigate further for dyslipidemia or heart disease. [2, 3, 6, 8, 19] Earlier studies have often suffered from difficulty in determining the presence or severity of atherosclerosis and from inconsistencies in evaluating corneal arcus. This study involves the review of mortality data, medical and social history and standardized slit lamp examination of corneal tissue donors to evaluate the prevalence of corneal arcus in relation to death by CHD or CVD. The prevalence of arcus, odds ratios, and logistic regression were used for statistical analysis.

Relevance:

100.00%

Abstract:

The incidence of OSCC in the younger population and in those who never smoked or drank has increased over the last decade. This increase may be attributable to an increase in infection with HPV. The pro-inflammatory cytokine TNF-α has a role in the pathogenesis of chronic inflammatory diseases and was found to control HPV infection in cervical cancer studies. Our study aimed to investigate the association between four polymorphisms located in the TNF-α promoter region, -308 (rs1800629), -857 (rs1799724), -863 (rs1800630) and -1031 (rs1799964), and the risk of HPV-related OSCC. In this hospital-based case-control study, 325 cases and 335 controls were included. We found that HPV 16 seropositivity was associated with an increased risk of oral cancer (OR = 3.1, 95% CI, 2.1–4.6). Each of the polymorphisms was shown to increase the risk of HPV-related OSCC. After combining the risk genotypes and using the low-risk group (0–1 combined risk genotypes) with HPV16 seronegativity as the reference group, the high-risk group (3–4 combined risk genotypes) with HPV16 seronegativity was associated with a low OR of 1.8 (95% CI, 1.1–2.8), while the low-risk and high-risk groups with HPV16 seropositivity were significantly associated with higher ORs of 2.7 (95% CI, 1.3–5.8) and 8.5 (95% CI, 3.7–19.4), respectively. In addition, the joint effects were greater among young subjects (aged <50), males, never smokers or never drinkers, and patients with oropharyngeal cancer. Overall, the four TNF-α polymorphisms, individually or collectively, were associated with a significantly increased risk of HPV16-associated oral cancer in a non-Hispanic white population. Larger studies are needed for future investigation.

Relevance:

100.00%

Abstract:

The ventricular system is a critical component of the central nervous system (CNS) that is formed early in development and remains functional throughout life. Changes in the ventricular system can be easily discerned via neuroimaging procedures and most often reflect changes in the physiology of the CNS. In this study we attempted to identify specific genes associated with variation in ventricular volume in humans. Methods. We conducted a genome-wide association (GWA) analysis of the volume of the lateral ventricles among 1605 individuals of European ancestry from two community-based cohorts, the Genetics of Microangiopathic Brain Injury (GMBI; N=814) and Atherosclerosis Risk in Communities (ARIC; N=791). Significant findings from the analysis were tested for replication in both cohorts and then meta-analyzed to obtain an estimate of overall significance. Results. In our GWA analyses, no single nucleotide polymorphism (SNP) reached genome-wide significance of p<10⁻⁸. There were 25 SNPs in GMBI and 9 SNPs in ARIC that reached a threshold of p<10⁻⁵. However, none of the top SNPs from each cohort were replicated in the other. In the meta-analysis, no SNP reached the genome-wide threshold of 5×10⁻⁸, but we identified five novel SNPs associated with variation in ventricular volume at the p<10⁻⁵ level. The strongest association was for rs2112536 in an intergenic region on chromosome 5q33 (Pmeta = 8.46×10⁻⁷). The remaining four SNPs were located on chromosome 3q23, encompassing the gene for Calsyntenin-2 (CLSTN2). The SNPs with the strongest association in this region were rs17338555 (Pmeta = 5.28×10⁻⁶), rs9812091 (Pmeta = 5.89×10⁻⁶), rs9812283 (Pmeta = 5.97×10⁻⁶) and rs9833213 (Pmeta = 6.96×10⁻⁶). Conclusions. This GWA study of ventricular volumes in community-based cohorts of European descent identifies potential loci on chromosomes 3 and 5. Further characterization of these loci may provide insights into the pathophysiology of ventricular involvement in various neurological diseases.